Fast Large-Scale Approximate Graph Construction for NLP

Authors

  • Amit Goyal
  • Hal Daumé
  • Raul Guerra
Abstract

Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.
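The idea of maintaining approximate counts with a sketch and deriving PMI from them can be illustrated with a minimal example. The abstract does not name the sketch FLAG uses, so the Count-Min sketch below is an assumption, as are the class `CountMinSketch`, the function `approx_pmi`, the key prefixes, and the salted-hash scheme. None of this is the authors' implementation, and it omits the distributed and nearest-neighbor search components.

```python
import hashlib
import math


class CountMinSketch:
    """Fixed-size approximate counter: may overcount, never undercounts."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Illustrative hashing choice: one salted BLAKE2b digest per row.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=str(row).encode("utf-8"),
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key, count=1):
        # Increment one cell in every row.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated cell.
        return min(
            self.table[row][self._index(key, row)]
            for row in range(self.depth)
        )


def approx_pmi(sketch, word, context, total_pairs):
    """PMI(word, context) in bits, computed from sketch estimates."""
    joint = sketch.estimate("pair:" + word + "\t" + context)
    w = sketch.estimate("word:" + word)
    c = sketch.estimate("ctx:" + context)
    if joint == 0 or w == 0 or c == 0:
        return float("-inf")
    return math.log2(joint * total_pairs / (w * c))


# Toy stream of (word, context) observations, processed one at a time.
pairs = [("cat", "purrs"), ("cat", "meows"), ("dog", "barks")]
sketch = CountMinSketch()
for word, context in pairs:
    sketch.add("word:" + word)
    sketch.add("ctx:" + context)
    sketch.add("pair:" + word + "\t" + context)

pmi_cat_purrs = approx_pmi(sketch, "cat", "purrs", len(pairs))
```

Because the table has a fixed size, memory use does not grow with the number of distinct word-context pairs. The price is that hash collisions can inflate individual counts, so the resulting PMI values are approximate.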


Similar resources

FLAG: Fast Large-Scale Graph Construction for NLP

Many natural language processing (NLP) problems involve constructing large nearest-neighbor graphs between word pairs by computing distributional similarity between word pairs from large corpora. In this paper, we first describe a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data in a memory- and time-efficient manner, FLAG maintains...


Large-scale Nonlinear Programming: An Integrating Framework for Enterprise-Wide Dynamic Optimization

Integration of real-time optimization and control with higher-level decision making (scheduling and planning) is an essential goal for profitable operation in a highly competitive environment. While integrated large-scale optimization models have been formulated for this task, their size and complexity remain a challenge to many available optimization solvers. On the other hand, recent develop...


A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...


Fast kNN Graph Construction with Locality Sensitive Hashing

The k nearest neighbors (kNN) graph, perhaps the most popular graph in machine learning, plays an essential role in graph-based learning methods. Despite its many elegant properties, the brute-force kNN graph construction method has computational complexity of O(n^2), which is prohibitive for large-scale data sets. In this paper, based on the divide-and-conquer strategy, we propose an efficient a...


CONSTRAINED BIG BANG-BIG CRUNCH ALGORITHM FOR OPTIMAL SOLUTION OF LARGE SCALE RESERVOIR OPERATION PROBLEM

A constrained version of the Big Bang-Big Crunch algorithm for the efficient solution of optimal reservoir operation problems is proposed in this paper. The Big Bang-Big Crunch (BB-BC) algorithm is a new population-based meta-heuristic algorithm that relies on one of the theories of the evolution of the universe, namely the Big Bang and Big Crunch theory. An improved formulation of the algorithm na...




Publication date: 2012